Current Issue : October - December Volume : 2015 Issue Number : 4 Articles : 6 Articles
Background: Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation.
Results: For simulations SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets.
Conclusions: SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases....
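To make the notion of "variable sites with phylogenetic signal" concrete, the following is a minimal sketch, not SISRS's own code. It assumes the homologous sites have already been collected into an alignment with one sequence per taxon; the function name `classify_sites`, the missing-data symbols, and the toy sequences are all hypothetical.

```python
from collections import Counter

def classify_sites(alignment, missing="N-?"):
    """Return (variable, informative) column indices for {taxon: sequence}.

    Hypothetical helper for illustration; this is not part of SISRS.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    variable, informative = [], []
    for i in range(length):
        # Count the bases observed in this column, ignoring missing data.
        counts = Counter(s[i].upper() for s in seqs if s[i].upper() not in missing)
        if len(counts) >= 2:
            variable.append(i)
            # Parsimony-informative: at least two bases, each seen in two or more taxa.
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative.append(i)
    return variable, informative

# Toy sequences invented for this example (not real ape data).
example = {
    "human": "ACGTACGA",
    "chimp": "ACGTACGA",
    "gorilla": "ACTTACGC",
    "orangutan": "ACTTNCGG",
}
variable_sites, informative_sites = classify_sites(example)
print(variable_sites, informative_sites)
```

Column 2 splits the taxa two against two, so it is both variable and parsimony-informative. Column 7 is variable, but only one of its bases is shared by two taxa, so it carries no grouping information under parsimony.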
Background: We consider data from a time course microarray experiment that was conducted on grapevines over the development cycle of the grape berries at two different vineyards in South Australia. Although the underlying biological process of berry development is the same at both vineyards, there are differences in the timing of the development due to local conditions. We aim to align the data from the two vineyards to enable an integrated analysis of the gene expression and use the alignment of the expression profiles to classify likely developmental function.
Results: We present a novel alignment method based on hidden Markov models (HMMs) and use the method to align the motivating grapevine data. We show that our alignment method is robust against subsets of profiles that are not suitable for alignment, investigate alignment diagnostics under the model and demonstrate the classification of developmentally driven genes.
Conclusions: The classification of developmentally driven genes both validates that the alignment we obtain is meaningful and also gives new evidence that can be used to identify the role of genes with unknown function. Using our alignment methodology, we find at least 1279 grapevine probe sets with no current annotated function that are likely to be controlled in a developmental manner....
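The abstract does not specify the authors' model, so the following is only a generic sketch of how an HMM can align two time courses. It assumes the hidden state at each time point of one vineyard is a time index in the other vineyard's profile, with Gaussian emissions and left-to-right transitions. The function `viterbi_align`, the parameters `sigma` and `step_probs`, and the toy profiles are all hypothetical.

```python
import math

def viterbi_align(reference, query, sigma=0.5, step_probs=(0.2, 0.6, 0.2)):
    """Map each query time point to a reference time index (Viterbi path).

    Assumed model: state j emits Normal(reference[j], sigma); from one query
    time point to the next the state advances by 0, 1 or 2 with step_probs.
    Transitions are left unnormalized at the end of the reference.
    """
    K, T = len(reference), len(query)
    log_step = [math.log(p) for p in step_probs]

    def log_emit(j, t):
        # Gaussian log-density up to a constant shared by all states.
        return -0.5 * ((query[t] - reference[j]) / sigma) ** 2

    score = [[float("-inf")] * K for _ in range(T)]
    back = [[0] * K for _ in range(T)]
    for j in range(K):
        score[0][j] = -math.log(K) + log_emit(j, 0)  # uniform start
    for t in range(1, T):
        for j in range(K):
            for step, lp in enumerate(log_step):
                i = j - step  # predecessor state
                if i < 0:
                    continue
                candidate = score[t - 1][i] + lp
                if candidate > score[t][j]:
                    score[t][j] = candidate
                    back[t][j] = i
            score[t][j] += log_emit(j, t)
    # Trace back from the best final state.
    j = max(range(K), key=lambda s: score[T - 1][s])
    path = [j]
    for t in range(T - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    return path[::-1]

# Toy profiles invented for illustration: the second site lags, then catches up.
site_a = [0, 1, 2, 3, 4, 5]
site_b = [0, 0, 1, 3, 5]
alignment = viterbi_align(site_a, site_b)
print(alignment)
```

Each entry of the returned path says which time point of the first profile best matches the corresponding time point of the second, so repeated or skipped indices show where development at one site ran slower or faster than at the other.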
Background: One aspect in which RNA sequencing is more valuable than microarray-based methods is the ability to examine the allelic imbalance of the expression of a gene. This process is often a complex task that entails quality control, alignment, and the counting of reads over heterozygous single-nucleotide polymorphisms. Allelic imbalance analysis is subject to technical biases, due to differences in the sequences of the measured alleles. Flexible bioinformatics tools are needed to ease the workflow while retaining as much RNA sequencing information as possible throughout the analysis to detect and address the possible biases.
Results: We present AllelicImbalance, a software program that is designed to detect, manage, and visualize allelic imbalances comprehensively. The purpose of this software is to allow users to pose genetic questions in any RNA sequencing experiment quickly, enhancing the general utility of RNA sequencing. The visualization features can reveal notable, non-trivial allelic imbalance behavior over specific regions, such as exons.
Conclusions: The software provides a complete framework to perform allelic imbalance analyses of aligned RNA sequencing data, from detection to visualization, within the robust and versatile management class, ASEset....
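As an illustration of the detection step, read counts for the two alleles at a heterozygous SNP can be compared with an exact binomial test. This is a minimal sketch and does not use the software's API or the ASEset class. The function name and the `expected_ref_fraction` parameter are assumptions introduced here; the parameter is one simple way to account for a known technical bias toward the reference allele.

```python
from math import comb

def allelic_imbalance_pvalue(ref_count, alt_count, expected_ref_fraction=0.5):
    """Exact two-sided binomial p-value for the reference-allele read count.

    Hypothetical helper for illustration only.
    """
    n = ref_count + alt_count
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    p = expected_ref_fraction
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_count]
    # Sum the probability of every outcome at most as likely as the observed one.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))

# Toy counts invented for illustration: (reference reads, alternative reads).
balanced = allelic_imbalance_pvalue(5, 5)
skewed = allelic_imbalance_pvalue(10, 0)
print(balanced, skewed)
```

With many SNPs tested at once, the resulting p-values would still need a multiple-testing correction before any site is called imbalanced.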
Background: Class prediction models have been shown to have varying performances in clinical gene expression datasets. Previous evaluation studies, mostly done in the field of cancer, showed that the accuracy of class prediction models differs from dataset to dataset and depends on the type of classification function. While a substantial amount of information is known about the characteristics of classification functions, little has been done to determine which characteristics of gene expression data have an impact on the performance of a classifier. This study aims to empirically identify data characteristics that affect the predictive accuracy of classification models, outside of the field of cancer.
Results: Datasets from twenty-five studies meeting predefined inclusion and exclusion criteria were downloaded. Nine classification functions were chosen, falling within the categories: discriminant analyses or Bayes classifiers, tree-based, regularization and shrinkage, and nearest neighbors methods. Consequently, nine class prediction models were built for each dataset using the same procedure and their performances were evaluated by calculating their accuracies. The characteristics of each experiment were recorded (i.e., observed disease, medical question, tissue/cell types and sample size) together with characteristics of the gene expression data, namely the number of differentially expressed genes, the fold changes and the within-class correlations. Their effects on the accuracy of a class prediction model were statistically assessed by random effects logistic regression. The number of differentially expressed genes and the average fold change had a significant impact on the accuracy of a classification model and gave individual explained variation in prediction accuracy of up to 72% and 57%, respectively. Multivariable random effects logistic regression with forward selection yielded the two aforementioned study factors and the within-class correlation as factors affecting the accuracy of classification functions, explaining 91.5% of the between-study variation.
Conclusions: We evaluated study- and data-related factors that might explain the varying performances of classification functions in non-cancerous datasets. Our results showed that the number of differentially expressed genes, the fold change, and the correlation in gene expression data significantly affect the accuracy of class prediction models....
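Two of the data characteristics named above can be sketched in a few lines. This is an illustration under stated assumptions, not the study's procedure: expression values are assumed to be on a log2 scale, the classes are coded 0 and 1, and a plain fold-change cutoff stands in for whatever statistical test the authors used to call genes differentially expressed. The function name, the cutoff, and the toy values are hypothetical.

```python
from statistics import mean

def fold_change_summary(expression, labels, threshold=1.0):
    """Summarize {gene: [log2 value per sample]} against 0/1 class labels.

    Returns (number of genes with |log2 fold change| >= threshold,
             average absolute log2 fold change, per-gene fold changes).
    """
    changes = {}
    for gene, values in expression.items():
        group0 = [v for v, label in zip(values, labels) if label == 0]
        group1 = [v for v, label in zip(values, labels) if label == 1]
        # On a log2 scale the fold change is a difference of class means.
        changes[gene] = mean(group1) - mean(group0)
    n_de = sum(1 for fc in changes.values() if abs(fc) >= threshold)
    avg_abs_fc = mean(abs(fc) for fc in changes.values())
    return n_de, avg_abs_fc, changes

# Toy values invented for illustration: two control and two case samples.
toy_expression = {
    "g1": [1, 1, 3, 3],
    "g2": [2, 2, 2, 2],
    "g3": [5, 5, 4.5, 4.5],
}
toy_labels = [0, 0, 1, 1]
n_de, avg_abs_fc, changes = fold_change_summary(toy_expression, toy_labels)
print(n_de, round(avg_abs_fc, 3))
```

Summaries like these, computed once per dataset, are the kind of study-level covariates that could then be related to classifier accuracy.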
Background: The rapid pace of bioscience research makes it very challenging to track relevant articles in one's area of interest. MEDLINE, a primary source for biomedical literature, offers access to more than 20 million citations with three-quarters of a million new ones added each year. Thus it is not surprising to see active research in building new document retrieval and sentence retrieval systems. We present Ferret, a prototype retrieval system, designed to retrieve and rank sentences (and their documents) conveying gene-centric relationships of interest to a scientist. The prototype has several features. For example, it is designed to handle gene name ambiguity and perform query expansion. Inputs can be a list of genes with an optional list of keywords. Sentences are retrieved across species but the species discussed in the records are identified. Results are presented in the form of a heat map and sentences corresponding to specific cells of the heat map may be selected for display. Ferret is designed to assist bioscientists at different stages of research from early idea exploration to advanced analysis of results from bench experiments.
Results: Three live case studies in the field of plant biology are presented related to Arabidopsis thaliana. The first is to discover genes that may relate to the phenotype of open immature flower in Arabidopsis. The second case is about finding associations reported between ethylene signaling and a set of 300+ Arabidopsis genes. The third case is on searching for potential gene targets of an Arabidopsis transcription factor hypothesized to be involved in plant stress responses. Ferret was successful in finding valuable information in all three cases. In the first case the bZIP family of genes was identified. In the second case sentences indicating relevant associations were found in other species such as potato and jasmine. In the third case, sentences led to new research questions about the plant hormone salicylic acid.
Conclusions: Ferret successfully retrieved relevant gene-centric sentences from PubMed records. The three case studies demonstrate end user satisfaction with the system....
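The heat-map presentation described above can be pictured as a gene-by-keyword grid in which each cell holds the sentences mentioning both terms. The following is a minimal sketch of that idea only. It is not Ferret's implementation and leaves out gene-name disambiguation, query expansion, and ranking. The function name, gene placeholders, and sentences are invented for illustration.

```python
import re
from collections import defaultdict

def cooccurrence_cells(sentences, genes, keywords):
    """Group sentences by the (gene, keyword) pairs they mention.

    Hypothetical helper: exact, case-insensitive token matching only.
    """
    cells = defaultdict(list)
    for sentence in sentences:
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence.lower()))
        for gene in genes:
            if gene.lower() not in tokens:
                continue
            for keyword in keywords:
                if keyword.lower() in tokens:
                    cells[(gene, keyword)].append(sentence)
    return cells

# Placeholder sentences invented for illustration (not real PubMed text).
toy_sentences = [
    "GENE1 responds to ethylene in this toy sentence.",
    "GENE2 and GENE1 are both mentioned with drought.",
    "No gene is named here, only ethylene.",
]
cells = cooccurrence_cells(toy_sentences, ["GENE1", "GENE2"], ["ethylene", "drought"])
# The heat map value for a cell is the number of supporting sentences.
heat = {pair: len(hits) for pair, hits in cells.items()}
print(heat)
```

Selecting a cell in such a grid then amounts to looking up `cells[(gene, keyword)]` and displaying the stored sentences.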
Background: Understanding living systems is crucial for curing diseases. To achieve this task we have to understand biological networks based on protein-protein interactions. Bioinformatics has produced a great number of databases and tools that support analysts in exploring protein-protein interactions on an integrated level for knowledge discovery. They provide predictions and correlations, indicate possibilities for future experimental research and fill the gaps to complete the picture of biochemical processes. There are numerous and huge databases of protein-protein interactions used to gain insights into answering some of the many questions of systems biology. Many computational resources integrate interaction data with additional information on molecular background. However, the vast number of diverse bioinformatics resources poses an obstacle to the goal of understanding. We present a survey of databases that enable the visual analysis of protein networks.
Results: We selected M = 10 out of N = 53 resources supporting visualization, and we tested them against the following set of criteria: interoperability, data integration, quantity of possible interactions, data visualization quality and data coverage. The study reveals differences in usability, visualization features and quality as well as the quantity of interactions. StringDB is the recommended first choice. CPDB presents a comprehensive dataset and IntAct lets the user change the network layout. A comprehensive comparison table is available via web. The supplementary table can be accessed at http://tinyurl.com/PPI-DB-Comparison-2015.
Conclusions: Only some web resources featuring graph visualization can be successfully applied to interactive visual analysis of protein-protein interaction. Study results underline the necessity for further enhancements of visualization integration in biochemical analysis tools. Identified challenges are data comprehensiveness, confidence, interactive features and visualization maturity....